Trash article detection using categorization techniques

Authors

  • Christos Bouras
  • Vassilis Tsogkas
  • Vassilis Poulopoulos
  • George Tsichritzis
Abstract

We explore techniques for detecting news articles that contain invalid information, with the help of text categorization technology. The volume of information on the World Wide Web is large enough to distract users who are trying to find useful information. Many text categorization methodologies have been presented to cope with these large amounts of data. One major problem is that many articles fetched by a crawler, stored in a back-end database, and finally given as input to a categorization subsystem may not contain valid information for the user (trashy articles). This may lead users to lose trust in the system. In this paper, we analyze the special properties of the categorization of trashy news articles that allow us to detect them, and we propose a specific methodology for trash detection. Finally, we evaluate the proposed algorithm on a news categorization system and show the overall benefit of a trash detection mechanism to the system.

Similar resources

Rice Classification and Quality Detection Based on Sparse Coding Technique

Classification of various rice types and determination of their quality is a major issue in the scientific and commercial fields associated with modern agriculture. In recent years, various image processing techniques have been used to identify different types of agricultural products. There are also various color and texture-based features in order to achieve the desired results in this area. In this ...

Trash detection system for a citrus canopy shake and catch harvester using machine vision

Automatic estimation of the amount of trash, such as branches and leaves, collected by a mechanical citrus harvester during harvesting eliminates problems in the processing plants with handling diseased leaves and fruit. A machine vision system was developed to estimate the amount of trash collected by a citrus canopy shake and catch harvester by acquiring and analyzing the images of the harves...

Robotic Detection of Marine Litter Using Deep Visual Detection Models

Trash deposits in aquatic environments have a destructive effect on marine ecosystems and pose a long-term economic and environmental threat. Autonomous underwater vehicles (AUVs) could very well contribute to the solution of this problem by finding and eventually removing trash. A step towards this goal is the successful detection of trash in underwater environments. This paper evaluates a num...

High Speed Trash Measurements

This paper discusses the identification of trash objects in cotton using machine vision-based systems. Soft computing techniques such as neural networks and fuzzy inference systems can classify trash objects into individual categories such as bark, stick, leaf, and pepper trash types with great accuracy. High-speed trash measurement enables the implementation of these techniques for on-line...

Detection and Elimination of Trash using Machine Vision and Extended De-Stemmer for a Citrus Canopy Shake and Catch Harvester

The main objective of this research was to design an efficient trash removal system and quantify the amount of trash materials such as leaves and twigs, generated during harvesting by a continuous citrus canopy shake and catch harvester, and to compare the efficiency of two destemmers with different lengths. A regular de-stemmer with a set of ten 24-inch long rollers and an extended de-stemmer ...

Publication date: 2009